Abstract: MapReduce implementations have mainly been designed for homogeneous clouds, yet data centres are being driven towards heterogeneous clouds by the evolving use of hybrid clouds, geo-distributed clouds, and networking and storage devices. Because MapReduce implementations are designed for homogeneous clusters, their performance is poor in heterogeneous clusters. In this paper we present an extended study of three kinds of factors, including system configuration and task scheduling, that affect resource utilization. We reach three key findings. First, the performance of a MapReduce job is affected by the performance of its map and reduce tasks when they run on different nodes. Second, job performance and resource utilization efficiency can be improved by scheduling map and reduce tasks dynamically according to the capacity of the nodes and prior knowledge of the workload. Third, when the shuffle data is large, random scheduling of reduce tasks degrades performance in heterogeneous clusters even though it performs well in homogeneous clusters.
Keywords: Hadoop, MapReduce, Heterogeneous clusters, Cloud.